
fix: sync gateway tokens in init-config to prevent token mismatch after pod restart #23

Merged
thepagent merged 2 commits into main from fix/gateway-token-mismatch
Mar 28, 2026

@thepagent thepagent commented Mar 27, 2026

Closes #22

The Problem (Issue #22)

On a fresh Helm install, the init container copies openclaw.json from the ConfigMap (which is rendered from .Values.config in values.yaml) onto the PVC. The gateway starts fine.

After a pod restart, however, the init container skips the copy because the file already exists on the PVC, so gateway.remote.token stays stale or unset, while the gateway process always reads its auth token fresh from the OPENCLAW_GATEWAY_TOKEN env var.

  Fresh install
  ┌──────────────────────────────────────────────────────┐
  │ init-config                                          │
  │  openclaw.json not on PVC                            │
  │  → copy from ConfigMap (rendered from values.yaml)   │
  └──────────────────────────────────────────────────────┘
           │                          │
           ▼                          ▼
  ┌─────────────────┐        ┌───────────────────┐
  │ gateway process │        │  openclaw CLI     │
  │ auth = PGdODrQd │  ✓ ok  │ remote = PGdODrQd │
  │ (env var)       │        │ (from PVC)        │
  └─────────────────┘        └───────────────────┘

  After pod restart
  ┌──────────────────────────────────────────────────────┐
  │ init-config                                          │
  │  openclaw.json already on PVC → SKIP                 │
  │  gateway.remote.token = (stale / missing)            │
  └──────────────────────────────────────────────────────┘
           │                          │
           ▼                          ▼
  ┌─────────────────┐        ┌─────────────────┐
  │ gateway process │        │  openclaw CLI   │
  │ auth = PGdODrQd │  ✗     │ remote = stale  │
  │ (env var)       │        │ (from PVC)      │
  └─────────────────┘        └─────────────────┘
           └──────── token mismatch ──┘
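
The skip shown above can be reproduced with a minimal sketch of a copy-if-missing step. The function name init_config and both path arguments are hypothetical; the chart's actual init script may be structured differently.

```shell
#!/bin/sh
# Sketch of the pre-fix init-config behaviour.
# ASSUMPTION: init_config and its arguments are illustrative names only.
# Copies the ConfigMap-rendered config onto the PVC only when no copy exists yet.
init_config() {
  src="$1"   # config rendered from the ConfigMap
  dst="$2"   # persistent copy on the PVC
  if [ ! -f "$dst" ]; then
    cp "$src" "$dst"
    echo "copied"
  else
    # The existing file is left untouched, so any token stored in it can go stale.
    echo "skipped"
  fi
}
```

On the second call with the same destination the function reports "skipped" and the file on the PVC keeps whatever it contained before, which is the state the "After pod restart" diagram describes.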

Why Only remote.token Needs Syncing

The gateway resolves its own auth token with env-first precedence:

  OPENCLAW_GATEWAY_TOKEN (env)  ←── always wins
           │
           ▼
  gateway.auth.token (config)   ←── only used if env var absent

So gateway.auth.token in openclaw.json is irrelevant when OPENCLAW_GATEWAY_TOKEN is set. Only the CLI needs gateway.remote.token in config to know what token to present.
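
The precedence rule can be written as a small shell sketch. resolve_gateway_token is a hypothetical helper, not an OpenClaw function, and it treats an empty env var the same as an absent one, which is an assumption rather than confirmed gateway behaviour.

```shell
#!/bin/sh
# Sketch of env-first token resolution as described above.
# ASSUMPTION: resolve_gateway_token is an illustrative helper only.
resolve_gateway_token() {
  config_token="$1"   # value of gateway.auth.token from openclaw.json (may be empty)
  # ${VAR:-fallback} expands to the fallback when VAR is unset or empty,
  # so the config value is used only when the env var provides nothing.
  printf '%s\n' "${OPENCLAW_GATEWAY_TOKEN:-$config_token}"
}
```

With OPENCLAW_GATEWAY_TOKEN set, the argument is ignored entirely, which is why a stale gateway.auth.token does not affect the gateway's own authentication.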

The Fix (This PR)

Run the sync on every pod start, not just first start:

  Every pod start
  ┌─────────────────────────────────────────────────────┐
  │ init-config                                         │
  │  1. copy openclaw.json if not on PVC (unchanged)    │
  │  2. if OPENCLAW_GATEWAY_TOKEN set:                  │
  │       gateway.remote.token = OPENCLAW_GATEWAY_TOKEN │
  └─────────────────────────────────────────────────────┘
           │                          │
           ▼                          ▼
  ┌─────────────────┐        ┌───────────────────┐
  │ gateway process │        │  openclaw CLI     │
  │ auth = PGdODrQd │  ✓ ok  │ remote = PGdODrQd │
  │ (env var)       │        │ (synced)          │
  └─────────────────┘        └───────────────────┘
           └──────────── match ✓ ─────┘

Non-Helm deployments are unaffected — the sync block is guarded by [ -n "$OPENCLAW_GATEWAY_TOKEN" ].
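
A minimal sketch of such a guarded sync step follows. It assumes jq is available in the init image, which this PR does not state; sync_remote_token and its path argument are hypothetical names, and the chart's real script may use a different tool to edit the JSON.

```shell
#!/bin/sh
# Sketch of the per-start sync step.
# ASSUMPTIONS: jq is installed in the init container; sync_remote_token and
# its config-path argument are illustrative names only.
sync_remote_token() {
  cfg="$1"
  # Guard: without the env var there is nothing to sync (non-Helm deployments).
  [ -n "${OPENCLAW_GATEWAY_TOKEN:-}" ] || return 0
  tmp="$(mktemp)" || return 1
  # --arg passes the token as a JSON string, so quoting in the value is handled by jq.
  if jq --arg t "$OPENCLAW_GATEWAY_TOKEN" '.gateway.remote.token = $t' "$cfg" > "$tmp"; then
    # Write back with cat rather than mv so the file keeps its owner and mode.
    cat "$tmp" > "$cfg"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Writing to a temporary file first means a failed jq run leaves the existing openclaw.json untouched instead of truncating it.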

Impact

Deployment                          | Impact
Helm with OPENCLAW_GATEWAY_TOKEN    | remote.token synced on every pod start
Helm without OPENCLAW_GATEWAY_TOKEN | ✅ No change
Non-Helm (binary)                   | ✅ No change

Note: This is a Helm-layer Workaround

The root cause is an asymmetry in the upstream codebase: the gateway reads its auth token from the env var, while the CLI reads it from openclaw.json. The proper fix would be for the CLI to also prefer OPENCLAW_GATEWAY_TOKEN over gateway.remote.token in config (env-first on the client side), which would eliminate the need for any sync logic here.

This PR patches the problem at the Helm layer until upstream adopts that approach. The sync script can be removed from this chart once the upstream CLI supports env-first credential resolution.

…er pod restart

- Fresh install: set only gateway.remote.token (gateway uses OPENCLAW_GATEWAY_TOKEN env var directly)
- Existing install: sync both gateway.auth.token and gateway.remote.token for backward compatibility
- If OPENCLAW_GATEWAY_TOKEN is not set, skip token sync entirely (non-Helm deployments unaffected)

Closes #22
@masami-agent

Thanks—this looks like a solid fix. A few suggestions to make the behavior clearer and easier to operate long-term:

  1. Define a single source of truth
    For Helm deployments, it’d help to explicitly document that OPENCLAW_GATEWAY_TOKEN (K8s Secret) is the source of truth, and openclaw.json is derived/synced at startup to avoid config/secret drift.

  2. Token rotation expectations
    Since tokens are re-synced on every pod start, rotating the Secret + restarting the pod will apply the new token (great). It may be worth adding a short note in README/upgrade notes that clients/CLI must be updated accordingly because old tokens will stop working.

  3. Avoid persisting secrets to disk (optional/configurable)
    Fresh install behavior (not persisting auth.token) is great. For existing installs where gateway.auth.token is written for backwards compatibility, consider a value flag like gateway.persistTokenInConfig (default false) or at least document the security implication (ensure the config volume/backups/logs don’t leak secrets).

…logic

gateway uses env-first precedence (OPENCLAW_GATEWAY_TOKEN > gateway.auth.token),
so writing auth.token to config is redundant. Remove IS_FRESH distinction and
only sync gateway.remote.token on every pod start.
@thepagent thepagent merged commit e32152b into main Mar 28, 2026
@thepagent thepagent deleted the fix/gateway-token-mismatch branch March 28, 2026 02:18
thepagent added a commit that referenced this pull request Apr 3, 2026
The Secret template stores the gateway token under key
OPENCLAW_GATEWAY_TOKEN, but init-config was reading it with
secretKeyRef.key: token — causing $OPENCLAW_GATEWAY_TOKEN to always
be empty in the init container.

This silently broke the PR #23 sync logic: gateway.remote.token in
openclaw.json on the PVC was never updated after pod restarts, leading
to token mismatch (1008) errors on every agent tool call.

Fixes #34
thepagent added a commit that referenced this pull request Apr 3, 2026
PR #23 only synced gateway.remote.token, but internal sub-processes
such as Telegram Exec Approvals read gateway.auth.token from the PVC
config to connect back to the gateway. After a pod restart the stale
auth.token caused token mismatch (1008) for all sub-processes.

Sync both fields to OPENCLAW_GATEWAY_TOKEN so the PVC config is fully
consistent regardless of which field a sub-process reads.

Fixes #36
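
A sketch of what syncing both fields could look like, again assuming jq is available in the init image. sync_gateway_tokens and its path argument are hypothetical names; the commit's actual script is not shown in this thread.

```shell
#!/bin/sh
# Sketch of syncing both token fields from the env var.
# ASSUMPTIONS: jq is installed; sync_gateway_tokens is an illustrative name only.
sync_gateway_tokens() {
  cfg="$1"
  # Skip entirely when the env var is unset or empty.
  [ -n "${OPENCLAW_GATEWAY_TOKEN:-}" ] || return 0
  tmp="$(mktemp)" || return 1
  # One jq pass sets both fields so the file never holds two different tokens.
  if jq --arg t "$OPENCLAW_GATEWAY_TOKEN" \
        '.gateway.auth.token = $t | .gateway.remote.token = $t' "$cfg" > "$tmp"; then
    cat "$tmp" > "$cfg"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```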